Epitope prediction improved by multitask support vector machines
Abstract
Motivation: In silico methods for predicting antigenic peptides that bind to MHC class I molecules play an increasingly important role in the identification of T-cell epitopes. Statistical and machine learning methods, in particular, are widely used to score candidate epitopes based on their similarity to known epitopes and non-epitopes. The genes coding for the MHC molecules, however, are highly polymorphic, and statistical methods have difficulty building models for alleles with few known epitopes. For such alleles, recent work has demonstrated that leveraging information across alleles improves prediction performance.

Results: We design a support vector machine algorithm that learns epitope models for all alleles simultaneously by sharing information across similar alleles. The sharing of information across alleles is controlled by a user-defined measure of similarity between alleles. We show that this similarity can be defined in terms of supertypes, or more directly by comparing key residues known to play a role in peptide-MHC binding. We illustrate the potential of this approach on various benchmark experiments, where it outperforms other state-of-the-art methods.
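One way to picture the information sharing described in the abstract is a kernel on (peptide, allele) pairs built as the product of a peptide kernel and an allele-similarity kernel. The sketch below is an illustration under stated assumptions, not the authors' implementation: the supertype table, the position-matching peptide kernel, the `share` parameter, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical allele-to-supertype table, for illustration only.
SUPERTYPE: Dict[str, str] = {
    "A*0201": "A2",
    "A*0206": "A2",
    "A*0301": "A3",
}

Example = Tuple[str, str]  # (peptide sequence, allele name)


def peptide_kernel(p: str, q: str) -> float:
    """Fraction of positions at which two equal-length peptides agree."""
    if len(p) != len(q):
        raise ValueError("peptides must have the same length")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def allele_kernel(a: str, b: str, share: float = 0.5) -> float:
    """Allele similarity: 1 for the same allele, `share` for different
    alleles in the same supertype, 0 otherwise.

    Written as a non-negative combination of two indicator kernels,
    so it stays positive semidefinite for 0 <= share <= 1.
    """
    same_allele = 1.0 if a == b else 0.0
    same_supertype = 1.0 if SUPERTYPE[a] == SUPERTYPE[b] else 0.0
    return (1.0 - share) * same_allele + share * same_supertype


def multitask_kernel(x: Example, y: Example, share: float = 0.5) -> float:
    """Product kernel: peptide similarity weighted by allele similarity."""
    return peptide_kernel(x[0], y[0]) * allele_kernel(x[1], y[1], share)


def gram_matrix(examples: List[Example], share: float = 0.5) -> List[List[float]]:
    """Pairwise kernel values for a list of (peptide, allele) examples."""
    return [[multitask_kernel(x, y, share) for y in examples] for x in examples]


if __name__ == "__main__":
    # Toy peptide strings, not real epitopes.
    toy = [("AAAA", "A*0201"), ("AAAB", "A*0206"), ("AAAA", "A*0301")]
    for row in gram_matrix(toy):
        print(row)
```

With `share = 0` each allele is modeled independently, while values closer to 1 let training examples from alleles in the same supertype influence each other more strongly. A Gram matrix of this kind could be passed to any SVM solver that accepts a precomputed kernel. A residue-based allele similarity would replace `allele_kernel` with a comparison of key binding-pocket residues.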
Similar resources
A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels
The limiting velocity in open channels to prevent long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a single Layer Feed-forward Neural Network (SLFNN) with a high level of training speed. The dimensionless parameter of limiting velocity which is known as the densimetric Froude number (Fr) is predicte...
Multitask centroid twin support vector machines
Twin support vector machines are a recently proposed learning method for binary classification. They learn two hyperplanes rather than one as in conventional support vector machines and often bring performance improvements. However, an inherent shortcoming of twin support vector machines is that the resultant hyperplanes are very sensitive to outliers in data. In this paper, we propose centroid tw...
Multitask Sparsity via Maximum Entropy Discrimination
A multitask learning framework is developed for discriminative classification and regression where multiple large-margin linear classifiers are estimated for different prediction problems. These classifiers operate in a common input space but are coupled as they recover an unknown shared representation. A maximum entropy discrimination (MED) framework is used to derive the multitask algorithm w...
Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir
The prediction of lithology is necessary in all areas of petroleum engineering. This means that to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVMs) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...
A prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)
Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...
Publication date: 2007